ABSTRACT

Scents are well known to be emitted from flowers and animals. In nature, these volatiles are responsible for inter- and intra-organismic communication, e.g. attraction and defence. Consequently, acting as single compounds or in mixtures, they influence and improve the establishment of organisms and populations in ecological niches. Despite the known wealth of volatile organic compounds (VOCs) from species of the plant and animal kingdoms, less attention has been focused on the volatiles of microorganisms in the past. Although fast and affordable sequencing methods facilitate the detection of microbial diseases, the analysis of signature or fingerprint volatiles will be faster and easier. Microbial VOCs (mVOCs) are presently used as markers to detect human diseases, food spoilage or moulds in houses. Furthermore, mVOCs have exhibited antagonistic potential against pathogens in vitro, but their biological roles in ecosystems remain to be investigated. Information on volatile emission from bacteria and fungi is currently scattered in the literature, and no public, up-to-date collection of mVOCs is available. To address this need, we have developed mVOC, a database available online at http://bioinformatics.charite.de/mvoc.

INTRODUCTION 

Microorganisms are ubiquitous in the biosphere. They are often found in large quantities and diverse compositions (microbiomes). For example, the human body harbours more microbial cells than human cells (~2 kg of microorganisms), and most of these microbes are essential or beneficial for the vitality of the human host (1). Bacteria are also dominant inhabitants of leaf surfaces (10^7 cells/cm²), and they are prominent in the
soil, e.g. 1 g of soil contains ~10 1 microbial cells (2). 

It is well known that microbes produce a diversity of natural compounds, e.g. antibiotics. Interestingly, the small molecular mass substances released by microorganisms were often overlooked, partly owing to the lack of appropriate absorption and detection technologies. Many of these small molecules (<300 Da) have high vapour pressures and low boiling points; together with their lipophilic character, these features favour volatility.

In the past decade, research on microbial smells has experienced a renaissance owing to their ubiquity. Some examples of bacterial volatile emissions are mentioned here. Prominent malodorous volatiles are produced by microorganisms during putrefaction (e.g. amines, sulphur compounds, indole and ammonia) (3), whereas the aromas of wine, sauerkraut, cheese and other fermented milk products are usually perceived as pleasant by the human nose (e.g. acids, alcohols and esters). The earthy, muddy smell of wet forest soils is due to geosmin, a volatile released by Streptomyces species (4-6). Microbiologists typically recognize the characteristic smell of indole from Escherichia coli. The human microbial flora at a given anatomical site is accompanied by a fairly specific volatile organic compound (VOC) profile (e.g. oral and breath malodour, sputum VOCs, gases released by the gut, sweat and sebum smell and foot odour). The VOC mixture of breath originates from more than one source within the respiratory system (e.g. tongue, oropharynx and bronchioles), and respiratory disorders can result in odorous gases being expelled into the air, which can be useful for diagnostic purposes (3). For example, methyl nicotinate has shown promise as a non-invasive and rapid diagnostic marker for Mycobacterium tuberculosis, and 2-nonanone emitted by Pseudomonas aeruginosa may serve as an in vivo marker of lung infections (7). Freshly secreted sweat is sterile, but its biotransformation by microorganisms (aerobic coryneforms, propionibacteria and Micrococcaceae) produces odoriferous VOCs. Wound infection is another example: a newly formed wound is an ecological niche favourable to microbial growth, and research is ongoing to develop a 'wound sniffing' device that discriminates between wound-related and unrelated volatiles. Volatiles, particularly off-flavours, also serve as fingerprints for systematically screening for spoiled foodstuff or identifying hidden microbial growth in buildings (8,9).

Furthermore, in the past few years, researchers have increasingly studied the effects of microbial VOCs (mVOCs) on plants, showing that some rhizobacteria release a blend of volatile components that promotes the growth of Arabidopsis thaliana (10), whereas others inhibit plant growth or are toxic to plants (11-14). It is also well documented that mVOCs are important mediators of specific microbial interactions: volatile-mediated intra- or interspecies interactions between bacteria and fungi in the soil result in morphological and phenotypical alterations of the receiving organism (15). In numerous instances, mVOCs are closely associated with insect feeding behaviours, and other microbial volatiles are known to be powerful repellents (16). Moreover, some volatile compounds, such as the higher alcohols 2-methyl-1-butanol, 3-methyl-1-butanol and isobutanol, can be used as biofuels (17); however, because natural microbial production rates are too low to support industrial production, metabolic engineering is widely used to improve yields (17,18, http://dx.doi.org/10.5772/52050).

These are just a few examples from the 349 bacterial and 69 fungal species currently known to emit volatiles. Given that ~10 000 microbial species have been described to date and at least a million are expected to exist on Earth, the VOC profiles of surprisingly few microorganisms have been investigated so far. The VOC spectra of microorganisms are species-specific and can be simple or complex (19). The qualitative and quantitative composition of an mVOC profile varies with growth conditions (temperature, oxygen availability, pH), carbon source availability and culture age (3,20-24). Ultimately, the volatile emission profile is a consequence of the specific metabolic activities of the particular microorganism.

Considering the importance and central roles of mVOCs in our biosphere, our objective was to establish a database of microbial volatiles for public use. Here, we present for the first time a user-friendly compilation of microbial volatiles extracted from the literature. Originally, only mVOCs with a PubChem ID were to be filed. During the literature search, however, it turned out that many mVOCs have not been assigned a PubChem ID but might be biologically relevant; these compounds were therefore also included in the database. Mixtures of mVOCs comprise various chemical classes, e.g. low molecular weight fatty acids and their derivatives (hydrocarbons, alcohols, aldehydes and ketones), terpenoids, aromatic compounds, nitrogen-containing compounds and volatile sulphur compounds (15,25). To date, ~1000 volatiles are filed in the mVOC database. References are given for each bacterial and fungal strain or isolate presented in the database. The user interface offers several search options, for instance by species name, PubChem ID, structure, molecular weight and logP value. Online upload is possible, allowing timely incorporation of new data sets, which are expected to appear progressively in this fast-growing research field.



MATERIALS AND METHODS 

The data were acquired through an extensive literature search in PubMed (http://www.ncbi.nlm.nih.gov/pubmed). Most of the information on mVOCs was found in ~20 journals, and the full texts of ~100 articles yielded most of the results compiled in the mVOC database. The literature was screened manually by biochemists. To keep mVOC up to date, the literature will be screened continuously, and new data will be checked by a chemist or biochemist before being entered manually into the database.

For the similarity search of compounds, the chemoinformatics package MyChem (http://mychem.sourceforge.net/) is integrated into the database. It enables the analysis and conversion of chemical data using OpenBabel (http://openbabel.org/) functionality. Calculating the Tanimoto coefficient (26) requires fingerprints to be assigned to the compounds; this step is also performed by the MyChem package, with OpenBabel following the Daylight theory of fingerprints (http://www.daylight.com/dayhtml/doc/theory/theory.finger.html). For determining the similarity between the compound of interest and the compounds of the mVOC database, the Tanimoto coefficient is the measure of choice:

$$T_{A,B} = \frac{N_{AB}}{N_A + N_B - N_{AB}}$$

Here, N_A and N_B are the numbers of bits set to one in the binary fingerprint vectors of compounds A and B, respectively, and N_AB is the number of bits set to one in both fingerprints. Tanimoto coefficients range between 0 and 1, where 1 indicates identical fingerprints (and hence highly similar structures) and 0 means that the fingerprint representations of the two molecules share no set bits. According to the 'similarity property principle' (27), structurally similar compounds should exhibit similar biological function. Nevertheless, small structural modifications can change the biological activity of molecules dramatically (28). Still, a Tanimoto coefficient >0.85 implies that the compared compounds may have similar biological activity (29).
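The formula can be checked with a minimal sketch that assumes fingerprints are held as Python integers used as bit vectors. Real OpenBabel fingerprints have a fixed bit length, which is not modelled here; the function names and toy bit patterns are illustrative only, and returning 0.0 for two empty fingerprints is a convention chosen for this sketch.

```python
def popcount(x: int) -> int:
    """Number of bits set to one in a non-negative integer."""
    return bin(x).count("1")


def tanimoto(fp_a: int, fp_b: int) -> float:
    """Tanimoto coefficient T = N_AB / (N_A + N_B - N_AB) of two bit fingerprints."""
    n_a = popcount(fp_a)            # bits set in compound A
    n_b = popcount(fp_b)            # bits set in compound B
    n_ab = popcount(fp_a & fp_b)    # bits set in both A and B
    denominator = n_a + n_b - n_ab  # equals the number of bits set in A OR B
    if denominator == 0:
        return 0.0                  # convention for two empty fingerprints (assumption)
    return n_ab / denominator


SIMILARITY_CUTOFF = 0.85  # threshold quoted in the text (29)

if __name__ == "__main__":
    a = 0b1011  # toy fingerprint: 3 bits set
    b = 0b0011  # toy fingerprint: 2 bits set, both shared with a
    t = tanimoto(a, b)
    print(round(t, 3), t > SIMILARITY_CUTOFF)  # 0.667 False
```

Because the denominator counts the bits set in either fingerprint, T is the size of the intersection of set bits divided by the size of their union.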

The open-source web component ChemDoodle is implemented on the mVOC website for sketching compounds in the 'Structure Search' and 'Add a new mVOC' functions. ChemDoodle is also used for 3D visualization of mVOC structures and runs smoothly on different platforms.

To retrieve Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways, a similarity search was carried out between the 846 mVOCs and the 200 000 compounds of the SuperTarget database (30). This step was conducted to obtain information on the synthesis and degradation of the mVOCs as well as on potential target interactions. A SuperTarget compound was considered when the Tanimoto coefficient of its similarity pair with a volatile was at least 0.85; these compounds were mapped onto the pathway maps. Additionally, mVOCs were mapped to KEGG compounds and are also displayed on the pathway maps. The database features pathways of the species available in the database. The mapped pathways are visualized via a web service.
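The 0.85 filtering step can be sketched as a brute-force all-pairs comparison. The dictionaries `mvocs` and `targets`, their identifiers and the toy fingerprints are hypothetical stand-ins for the mVOC and SuperTarget data; in the database itself, fingerprints and similarities are computed with MyChem/OpenBabel, not with this code.

```python
from typing import Dict, List, Tuple

THRESHOLD = 0.85  # minimum Tanimoto coefficient stated in the text


def tanimoto(fp_a: int, fp_b: int) -> float:
    """Tanimoto coefficient of two integer bit fingerprints."""
    n_ab = bin(fp_a & fp_b).count("1")
    denominator = bin(fp_a).count("1") + bin(fp_b).count("1") - n_ab
    return n_ab / denominator if denominator else 0.0


def similar_pairs(mvocs: Dict[str, int], targets: Dict[str, int],
                  threshold: float = THRESHOLD) -> List[Tuple[str, str, float]]:
    """(mVOC id, target id, T) for every pair with T >= threshold."""
    pairs = []
    for mvoc_id, mvoc_fp in mvocs.items():
        for target_id, target_fp in targets.items():
            t = tanimoto(mvoc_fp, target_fp)
            if t >= threshold:
                pairs.append((mvoc_id, target_id, t))
    return pairs


if __name__ == "__main__":
    mvocs = {"mvoc_1": 0b1111111111}          # 10 bits set (toy data)
    targets = {"target_1": 0b1111111110,      # 9 shared bits -> T = 0.9
               "target_2": 0b0000011111}      # 5 shared bits -> T = 0.5
    print(similar_pairs(mvocs, targets))  # [('mvoc_1', 'target_1', 0.9)]
```

For 846 mVOCs against 200 000 compounds, this loop performs about 1.7 × 10^8 comparisons, which is manageable as a one-off precomputation rather than an interactive query.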

The mVOC database is implemented as a relational database on a MySQL server. PHP and JavaScript were used to build the website, and web access is provided by Apache HTTP Server 2.
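To illustrate how species, compounds and literature-backed emission records fit a relational model, here is a minimal sketch that uses Python's built-in `sqlite3` as a stand-in for MySQL. The table and column names are hypothetical and do not describe the actual mVOC schema. The nullable `pubchem_id` reflects the decision to include compounds without a PubChem ID, and the two demo records (geosmin from Streptomyces, indole from E. coli) come from the Introduction, with PubChem IDs left NULL rather than guessed.

```python
import sqlite3

SCHEMA = """
CREATE TABLE species (
    id      INTEGER PRIMARY KEY,
    name    TEXT NOT NULL UNIQUE,
    kingdom TEXT NOT NULL CHECK (kingdom IN ('Bacteria', 'Fungi'))
);
CREATE TABLE compound (
    id         INTEGER PRIMARY KEY,
    name       TEXT NOT NULL UNIQUE,
    pubchem_id INTEGER,      -- NULL allowed: many mVOCs have no PubChem ID
    mol_weight REAL          -- g/mol
);
-- Many-to-many link: one row per (species, compound) observation.
CREATE TABLE emission (
    species_id  INTEGER NOT NULL REFERENCES species(id),
    compound_id INTEGER NOT NULL REFERENCES compound(id),
    reference   TEXT,        -- literature source of the observation
    PRIMARY KEY (species_id, compound_id)
);
"""


def build_demo_db() -> sqlite3.Connection:
    """In-memory database holding two illustrative records."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO species VALUES (?, ?, ?)",
                     [(1, "Streptomyces sp.", "Bacteria"),
                      (2, "Escherichia coli", "Bacteria")])
    conn.executemany("INSERT INTO compound VALUES (?, ?, ?, ?)",
                     [(1, "geosmin", None, 182.30),
                      (2, "indole", None, 117.15)])
    conn.executemany("INSERT INTO emission VALUES (?, ?, ?)",
                     [(1, 1, None), (2, 2, None)])
    return conn


def emitters_of(conn: sqlite3.Connection, compound_name: str) -> list:
    """Names of all species recorded as emitting the given compound."""
    rows = conn.execute(
        """SELECT s.name
             FROM species s
             JOIN emission e ON e.species_id = s.id
             JOIN compound c ON c.id = e.compound_id
            WHERE c.name = ?
            ORDER BY s.name""",
        (compound_name,),
    ).fetchall()
    return [name for (name,) in rows]


if __name__ == "__main__":
    db = build_demo_db()
    print(emitters_of(db, "geosmin"))  # ['Streptomyces sp.']
```

SQLite only enforces the foreign keys after `PRAGMA foreign_keys = ON`. In MySQL, InnoDB tables enforce them by default.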



RESULTS 

With 846 compounds and 5431 synonyms assigned to 349 bacterial and 69 fungal species, mVOC is the first online database containing information about mVOCs and their emitting organisms.

Search options 

Figure 1. The mVOC database offers different search options. mVOC search: a general search form for mVOCs based on PubChem ID, name, several molecular properties as well as species; the result table is retrieved directly. Structure search: a structure is drawn interactively and a structure or substructure search is performed. The result table shows volatile compounds similar to the search entry (similarity search) or volatile compounds containing a substructure similar to the search entry (substructure search). Clicking on 'Information' leads to the result table of the mVOC. Signatures: the signature table shows all species emitting the same compounds as the chosen species; compounds emitted by just one species are highlighted in green. KEGG pathways: excerpt of the 2-oxocarboxylic acid metabolism pathway.

The database provides several ways to search for compounds (Figure 1). A form ('browse mVOC') is available, in which the user can search by PubChem ID, name or molecular formula. In addition, compounds can be searched by selecting a range of properties such as molecular weight or logP value, by specific chemical groups (e.g. alkenes), by species, by the kingdom of the emitting microorganisms, or by a combination of these parameters. Furthermore, compounds can be searched by structural similarity ('Structure Search'), including substructure search, which requires the molecular structure of the query compound. The user draws the query compound with the embedded ChemDoodle interface (http://www.chemdoodle.com) or uploads its MOL file; the compound is then screened against the database by calculating the Tanimoto coefficient between it and every compound in the mVOC database. In addition, the mVOC database can be browsed under the 'browse mVOC' form by initial letter or chemical group.
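The combined property search can be pictured as a conjunction of optional filters. This sketch assumes a hypothetical in-memory `Record` type and is not the mVOC implementation. Molecular weights are computed from the molecular formulas. Kingdom assignments cover only the organisms named in the Introduction, plus 1-octen-3-ol as a well-known fungal volatile. logP is left unset rather than invented.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Sequence, Tuple

Range = Tuple[float, float]


@dataclass(frozen=True)
class Record:
    name: str
    mol_weight: float                 # g/mol
    kingdoms: FrozenSet[str]          # kingdoms of the emitting species
    logp: Optional[float] = None      # unknown unless curated


def in_range(value: Optional[float], bounds: Optional[Range]) -> bool:
    """True if no bounds are given, or if value lies within the inclusive bounds."""
    if bounds is None:
        return True
    return value is not None and bounds[0] <= value <= bounds[1]


def search(records: Sequence[Record],
           mw_range: Optional[Range] = None,
           logp_range: Optional[Range] = None,
           kingdom: Optional[str] = None) -> List[str]:
    """Names of records satisfying every filter that was supplied."""
    return [r.name for r in records
            if in_range(r.mol_weight, mw_range)
            and in_range(r.logp, logp_range)
            and (kingdom is None or kingdom in r.kingdoms)]


RECORDS = [
    Record("geosmin", 182.30, frozenset({"Bacteria"})),
    Record("indole", 117.15, frozenset({"Bacteria"})),
    Record("2-nonanone", 142.24, frozenset({"Bacteria"})),
    Record("methyl nicotinate", 137.14, frozenset({"Bacteria"})),
    Record("1-octen-3-ol", 128.21, frozenset({"Fungi"})),
]

if __name__ == "__main__":
    print(search(RECORDS, mw_range=(100, 150), kingdom="Bacteria"))
    # ['indole', '2-nonanone', 'methyl nicotinate']
```

When a logP range is supplied, records with unknown logP are excluded rather than silently matched.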

Search results 

The result report of 'browse mVOC' shows information about the compound of interest, including name, synonyms, PubChem ID and structural information (Figure 1). Additionally, the microorganisms emitting the compound, the effect of the compound on other organisms, the methods used to retrieve the compound and the corresponding references are displayed. The results of the 'Structure Search' are listed in decreasing order of similarity, together with the calculated Tanimoto coefficient, PubChem ID, name and 2D structure. The 'Information' button provides detailed information about the mVOCs that are similar to the query compound.
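Ranking the 'Structure Search' hits can be sketched as scoring every database fingerprint against the query and sorting by decreasing Tanimoto coefficient. The toy fingerprints, the function `rank_by_similarity` and its optional `min_score` cutoff are assumptions for illustration. Substructure matching, which needs a proper graph-matching toolkit, is not shown.

```python
from typing import Dict, List, Tuple


def tanimoto(fp_a: int, fp_b: int) -> float:
    """Tanimoto coefficient of two integer bit fingerprints."""
    n_ab = bin(fp_a & fp_b).count("1")
    denominator = bin(fp_a).count("1") + bin(fp_b).count("1") - n_ab
    return n_ab / denominator if denominator else 0.0


def rank_by_similarity(query_fp: int, database: Dict[str, int],
                       min_score: float = 0.0) -> List[Tuple[str, float]]:
    """(name, score) pairs sorted by decreasing score; ties broken by name."""
    hits = [(name, tanimoto(query_fp, fp)) for name, fp in database.items()]
    hits = [(name, score) for name, score in hits if score >= min_score]
    hits.sort(key=lambda hit: (-hit[1], hit[0]))
    return hits


if __name__ == "__main__":
    toy_db = {"a": 0b1111, "b": 0b0111, "c": 0b1000}
    print(rank_by_similarity(0b1111, toy_db))
    # [('a', 1.0), ('b', 0.75), ('c', 0.25)]
```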

Biological interpretation 

The website uses KEGG pathway maps (http://www.kegg.jp/), accessed via a web service. KEGG pathway maps provide knowledge about metabolic pathways as well as compound-target interactions and offer a basis for biological interpretation (Figure 1). Compounds of the mVOC database are mapped onto the pathways, providing a starting point for further analysis of their metabolic context. Moreover, medical effects can also be investigated. Links to the genes or gene clusters responsible for mVOC production will be included in future versions of this database.

Another important feature is the 'signature table' of an organism of choice (Figure 1). After a species is selected from the bacterial or fungal species drop-down menu in 'browse mVOC', a 'signature' button becomes available at the top of the result page. The 'signature table' lists the mVOCs emitted by the chosen species together with all microbial species that emit these mVOCs. The table thus shows which compounds are unique to the chosen species, which is important, for example, for distinguishing between more and less pathogenic species.
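The logic of the 'signature table' can be sketched as an inverted lookup: for each compound emitted by the chosen species, collect every species that emits it and flag compounds with a single emitter. The mapping `emissions` and the placeholder species and compound names are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def signature_table(emissions: Dict[str, Set[str]],
                    chosen: str) -> List[Tuple[str, List[str], bool]]:
    """Rows of (compound, emitting species, unique to the chosen species?)."""
    rows = []
    for compound in sorted(emissions[chosen]):
        emitters = sorted(species for species, compounds in emissions.items()
                          if compound in compounds)
        rows.append((compound, emitters, emitters == [chosen]))
    return rows


if __name__ == "__main__":
    # Placeholder data: species -> set of emitted compounds.
    emissions = {
        "species A": {"compound x", "compound y"},
        "species B": {"compound y", "compound z"},
        "species C": {"compound y"},
    }
    for compound, emitters, unique in signature_table(emissions, "species A"):
        print(compound, emitters, "unique" if unique else "")
```

For large data sets, building a compound-to-species index once would avoid rescanning all species for every compound.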

Database extension 

To extend the mVOC database, an upload function 'Add new mVOCs' is included (Figure 1). The user can upload a compound by drawing its structure with the ChemDoodle application. After upload, the compound is verified by biochemists and, once confirmed as an mVOC, included in the database. Users are also encouraged to contact the authors when new volatile spectra are ready to be uploaded.

DISCUSSION 

Microbes make up the majority of the world's biomass; their numbers and diversity greatly surpass those of all other organisms (31). Microbial chemical ecology is an important part of our life, and analysing the microbiome and unravelling its physiology, including immune response, metabolism and pathology, are future goals (32). Although sequencing is becoming cheaper, the analysis of volatiles will remain faster and less invasive. Therefore, the mVOC database is an indispensable platform for the burgeoning field of microbial volatiles. Interest is particularly focused on identifying 'signature volatiles' of human, animal and plant pathogenic species (33). Based on these results, new diagnostic approaches can be considered. The application of volatile antibiotics can also be envisioned.